fix: coerce judge score drift #756
Signed-off-by: schultzjack <schultzjack@users.noreply.github.com>
All contributors have signed the DCO ✍️ ✅
I have read the DCO document and I hereby sign the DCO.
recheck
Stale PR reminder
This PR has had failing checks for 7 days without activity. Failing checks: check. Please push an update or leave a comment if you're still working on this. To prevent auto-close, add the
Thanks for the contribution; this is a useful bit of tolerance around judge outputs. I reviewed the score coercion path and the generated Pydantic models. The implementation is nicely scoped and I don't see any major blockers, but I'd like a small polish pass before merge. A couple of test cases would make the new fallback behavior clearer:
Also, please add a short comment above the bool guard in `…`. Focused tests and smoke checks passed locally. Once those small coverage and readability items are in, this looks good to merge from my side.
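For context on why the bool guard deserves a comment: in Python, `bool` is a subclass of `int`, so `isinstance(True, int)` is `True`, and a coercer with a numeric branch would silently turn a judge's `True` into a score of `1.0`. The sketch below illustrates that ordering together with a lenient fallback for drifting string scores. Everything in it is hypothetical and not taken from the PR: the `coerce_score` name, the 0 to 10 range, the "first number in the string" rule, and returning `None` as the fallback are assumptions for illustration only.

```python
import math
import re
from typing import Optional

# Hypothetical bounds; the PR does not state the judge's score range.
SCORE_MIN = 0.0
SCORE_MAX = 10.0

# Matches the first integer or decimal number in a string, e.g. "7.5" or "-1".
_NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def coerce_score(raw: object) -> Optional[float]:
    """Hypothetical best-effort coercion of a judge score; None means unusable."""
    # bool is a subclass of int, so True/False would otherwise pass the
    # numeric branch below and become 1.0/0.0. This check must come first.
    if isinstance(raw, bool):
        return None
    if isinstance(raw, (int, float)):
        value = float(raw)
    elif isinstance(raw, str):
        # Tolerate drift such as " 7.5 " or "Score: 9/10" by taking the
        # first number that appears in the string.
        match = _NUMBER_RE.search(raw)
        if match is None:
            return None
        value = float(match.group())
    else:
        # Any other type (None, dict, list, ...) is treated as unusable.
        return None
    # Reject NaN/inf and out-of-range values instead of clamping them.
    if not math.isfinite(value) or not (SCORE_MIN <= value <= SCORE_MAX):
        return None
    return value


if __name__ == "__main__":
    for sample in (True, 7, " 7.5 ", "Score: 9/10", "n/a", 42):
        print(repr(sample), "->", coerce_score(sample))
```

Returning `None` rather than raising keeps the fallback decision with the caller (for example, marking the judgment invalid), and the same samples double as the kind of small test cases the review asks for.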
Summary
Scope
This addresses the LLM-judge validation path discussed in #569. It intentionally leaves the broader LLM-structured schema coercion path unchanged.
Testing
Fixes #569